-
Notifications
You must be signed in to change notification settings - Fork 919
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prevent internal usage of expensive APIs #10263
Prevent internal usage of expensive APIs #10263
Conversation
Codecov Report
@@ Coverage Diff @@
## branch-22.04 #10263 +/- ##
================================================
+ Coverage 10.42% 10.66% +0.23%
================================================
Files 119 122 +3
Lines 20603 20898 +295
================================================
+ Hits 2148 2228 +80
- Misses 18455 18670 +215
Continue to review full report at Codecov.
|
This is cool! I'm wondering about its applicability beyond |
That's a great question. I don't anticipate this being terribly broadly applicable, but there are enough use cases for it to be valuable I think. A couple that come to mind:
It's possible (I would even say likely...) that there are other slow APIs that we just aren't aware of right now and might turn up upon more careful benchmarking, or APIs that seem unavoidable internally but might actually be more avoidable than we think. For instance, could we get rid of |
Another common one is use of |
16182bf
to
d7cc9cb
Compare
Setting the env var for devs could be done with conda env.yml file: |
Thanks for the find @quasiben. I ended up going with a pytest-based approach so that tests will always be run this way. |
f3e394f
to
3ea53d8
Compare
… and different test file locations.
064b1f1
to
05dc395
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Note for the future: I prefer to review PRs like this with an example for context. I would have appreciated seeing this applied to a single function so that I could better rationalize about the design. I saw your comment about it being tested and wanting to keep the changeset clean, but as a reviewer I can't prove that this works as claimed unless it is used at least once.
As a middle ground of "clean changeset" and "design in context," you could include a code example in the PR description or in a comment instead of in the changeset. For this specific case, the decorator is fairly trivial, though, so not worth worrying about an example.
Unfortunately you're just a bit late to the game. This branch originally did contain an example of applying this to |
Haha, that's fine then! I didn't know the context, my apologies. I definitely feel confident enough in the PR to count my approval here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving ops-codeowner
file changes
@gpucibot merge |
This PR uses the feature introduced in #10263 to remove the usage of DataFrame.columns where possible. Currently this removes a large number of internal uses but not all of them to verify that CI does indeed fail as expected since some changes were made on #10263 after the last test of CI by reviewer request. I will convert this from a draft once I see CI fail and once all the necessary changes have been made for tests to pass. Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Michael Wang (https://github.com/isVoid) URL: #10315
This PR introduces a decorator that can be used to mark APIs that are unnecessarily expensive and should be avoided in internal usage. The decorator does nothing by default, but when the corresponding environment variable is set the decorator wraps decorated functions in a check of the current call stack. If the caller of that API is within the cudf source tree, it will raise an exception that may optionally also indicate an alternate API for developers to use instead. This implementation is completely transparent to users (including having no performance impact aside from some negligible additional import time from applying the decorator) but can be an invaluable tool to developers seeking to reduce Python overheads in cuDF Python. This decorator has been tested and shown to work (both locally and on CI) for
DataFrame.columns
, but those changes are not included in this PR in order to keep the changeset clean.